true

Background

The Penobscot River in Maine has been the object of environmental concern and focus for years. It is the largest watershed in New England and hosts the largest run of Atlantic salmon on the East Coast. Despite its ecological importance, the watershed has experienced significant degradation due to high levels of mercury contamination by a chemical plant. In 1967, a pharmaceutical company, Mallinckrodt began discharging mercury into the river as part of its production process, leaving 13 tons of Mercury in the river by 1970.

Mercury is a particularly worrisome and challenging contaminant because it does not easily break down. The contamination has had a significant and lasting impact on the surrounding ecosystem and nearby communities. The state of Maine issued an advisory warning pregnant women against eating certain species harvested from the river, and certain lobster and crab fisheries have been shut down.

However, due to the steadfast and tenacious efforts of environmental advocates, this story has a silver lining. Senior attorney Nancy Marks from the NRDC (the Natural Resources Defense Council) filed a lawsuit in 1998, and after two decades of litigation, a major legal breakthrough was achieved. In 2022, the district attorney of Maine settled the long-running case, requiring Mallinckrodt to pay $197 million in remediation for the mercury damage.

Because this watershed has been the subject of extensive study to document the need for remediation, there is considerable data about its sediment history. NOAA has compiled datasets on sediment, soil, and tissue chemistry from various studies of the Penobscot watershed. Using this data, we will examine mercury levels in the sediment.

Research Questions and Hypotheses

Enter questions and hypotheses here

Research Questions:

Question 1. How do mercury contamination levels change over time in the sediments in the Penobscot River? Question 2. Does mercury contamination increase with depth?

Hypothesis 1: Post closure of the site, we would expect that mercury levels in surface sediment will decrease over time and as they get further away from the site due to natural attenuation.

Hypothesis 2: Over time, mercury concentrations have decreased in the Penobscot River due to sedimentation in the river.

Dataset Information

The dataset was generated from NOAA’s DIVER (Data Integration, Visualization, Exploration and Reporting) system, which is part of the NOAA Damage Assessment, Remediation, and Restoration Program (DARRP). The data contains Mercury results from sediment samples collected in the Penobscot River region in Maine from June 1st, 1970 to May 5th, 2021. Data is not available for every month of the years provided, likely due to shifting priorities with monitoring or project specific challenges.

Item Value
Data Source NOAA DIVER
Date Range June 1st, 1970 to May 5th, 2021
Number of Records 28,058
Records with Coordinates 24,501
## Reading layer `Penobscot_River' from data source 
##   `/home/guest/ENV872_EDA_FinalProject/Shapefiles/Penobscot_River.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -7667023 ymin: 5542624 xmax: -7647021 ymax: 5593963
## Projected CRS: WGS 84 / Pseudo-Mercator

Data Exploration

We began by importing the raw NOAA Diver dataset and subsetting it to include only the columns relevant to our analysis, including site location, depth, depth type, chemical type (mercury), and concentration. Next, we flagged all records containing coordinate information and created a new subset consisting only of samples with valid coordinates. This subset was then converted into a spatial dataset for future mapping and spatial analyses.

##  [1] "Station_Group_List"       "Case_Activity"           
##  [3] "Source_Type"              "Data_Classification"     
##  [5] "Site_ID"                  "Data_Source"             
##  [7] "Data_Category"            "Sharing_Status"          
##  [9] "Workgroup"                "Collection_Workplan"     
## [11] "Gulfspill_Workplan_Name"  "LOSDMS_Workplan_ID"      
## [13] "Study_Name"               "Study_ID"                
## [15] "Component_Name"           "Project_Name"            
## [17] "Area_Description"         "DB_Sample_ID"            
## [19] "Matrix_Group"             "Trip_ID"                 
## [21] "File_Collection_ID"       "Station"                 
## [23] "Station_Description"      "Habitat_Type"            
## [25] "Date"                     "Hour_of_day"             
## [27] "Minutes_of_Hour"          "Sample_ID"               
## [29] "Survey_Notes"             "Sample_Notes"            
## [31] "Collection_Matrix"        "Collection_Method"       
## [33] "Sample_Upper_Depth"       "Sample_Lower_Depth"      
## [35] "Sample_Depth_Unit"        "Depth_Category"          
## [37] "Lab_Name"                 "Sample_Delivery_Group"   
## [39] "Analysis_Method"          "Lab_ID"                  
## [41] "Sample_Type"              "Lab_Replicate"           
## [43] "Analysis_Matrix"          "Analysis_Matrix_Detailed"
## [45] "Analysis_Detail"          "Detection_Limit"         
## [47] "Reporting_Limit"          "Measurement_Basis"       
## [49] "Validation_Level"         "PDB_Sample_Details"      
## [51] "TOC_pct"                  "Analysis_Type"           
## [53] "Analysis"                 "ChemCode"                
## [55] "DV_Qual_Reason"           "Qualifier_Code"          
## [57] "Analysis_Result_Unit"     "Location_Geom"           
## [59] "Max_Result_.Raw."         "Start_Latitude"          
## [61] "Start_Longitude"          "End_Latitude"            
## [63] "End_Longitude"
## Rows: 23,224
## Columns: 18
## $ Case_Activity        <chr> "Penobscot River", "Penobscot River", "Penobscot …
## $ Site_ID              <int> 3200, 3200, 3200, 3200, 3200, 3200, 3200, 3200, 3…
## $ Study_Name           <chr> "PR EGAD Fort Point Cove Sed & Tiss 2005", "PR EG…
## $ Area_Description     <chr> "FORT POINT COVE - PBFPFP", "FORT POINT COVE - PB…
## $ Matrix_Group         <chr> "Sediment", "Sediment", "Sediment", "Sediment", "…
## $ Station              <chr> "REP 1_FPC", "REP 2_FPC", "REP 3_FPC", "SBS-403",…
## $ Date                 <date> 2005-11-10, 2005-11-10, 2005-11-10, 1994-08-01, …
## $ Sample_ID            <chr> "3200-24-PBFPFP REP 1", "3200-24-PBFPFP REP 2", "…
## $ Sample_Upper_Depth   <dbl> -9, -9, -9, 0, 0, -9, -9, -9, -9, -9, -9, -9, -9,…
## $ Sample_Lower_Depth   <dbl> -9.00, -9.00, -9.00, 60.96, 60.96, -9.00, -9.00, …
## $ Sample_Depth_Unit    <chr> NA, NA, NA, "cm", "cm", NA, NA, NA, NA, NA, NA, N…
## $ Depth_Category       <chr> "Subsurface Sediment", "Subsurface Sediment", "Su…
## $ Lab_Replicate        <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
## $ ChemCode             <chr> "MERCURY", "MERCURY", "MERCURY", "MERCURY", "MERC…
## $ Max_Result_.Raw.     <dbl> 0.23, 0.30, 0.24, 2.50, 22.00, 0.10, 19.00, 5.00,…
## $ Analysis_Result_Unit <chr> "PPM", "PPM", "PPM", "PPM", "PPM", "PPM", "PPM", …
## $ Coordinate_Flag      <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
## $ geometry             <POINT [°]> POINT (-68.815 44.47167), POINT (-68.815 44…
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   -9.000    0.131    0.332    6.789    1.140 6494.000
## [1] "PPM"  "mg/L"
## [1] NA   "cm"

After cleaning the dataset, we plotted sample depths against mercury concentrations, mapped the sampling locations onto the Penobscot River, and examined concentration trends over time. We plotted a reference line representing the mercury PEC, above which harmful effects to sediment-dwelling biota are likely to occur.

A preliminary review revealed a major data gap between 1970 and 2000, leaving insufficient information to conduct a long-term time-series analysis or evaluate the direct effects of the 1967 contamination event. This gap reflects broader historical issues in environmental monitoring and regulatory oversight during that period. To ensure analytical consistency, we restricted our final dataset to samples collected from 2000–2021.

Data Wrangling

Our initial exploration showed that some concentration results were coded as –9, NOAA’s indicator for missing data, so these records were excluded. We also identified two measurement units—PPM and mg/L—but because mg/L is not appropriate for solids and appeared in only one record, that entry was removed. Additionally, some samples had missing or inconsistent depth information, so an average depth value will need to be generated for further analysis.

#filter data before 2000 and remove -9 in results and mg/L units
#also converting sample depths to feet
data_wrangle_3 <- data_wrangle_2 %>%
  mutate(Year=year(Date)) %>%
  filter(Year>(2000)) %>% 
  filter(Max_Result_.Raw. >= 0) %>% 
  filter(Analysis_Result_Unit == "PPM") %>% 
  mutate(Upper_Depth_ft = if_else(Sample_Upper_Depth >= 0, 
                                  Sample_Upper_Depth*0.0328084,
                                  Sample_Upper_Depth),
         Lower_Depth_ft = if_else(Sample_Lower_Depth >= 0,
                                  Sample_Lower_Depth*0.0328084,
                                  Sample_Lower_Depth),
         Avg_Depth_ft = if_else(Upper_Depth_ft >= 0 & Lower_Depth_ft >= 0,
                               (Upper_Depth_ft+Lower_Depth_ft)/2, -9))

#Intersect the samples with the Penobscot River to ensure we are only looking
#at samples in our area of interest.

data_clean <- data_wrangle_3 %>% 
  st_filter(Penobscot_clean) %>% 
  arrange(Max_Result_.Raw.)

#create a summary table

#Spatial Analysis

The analysis begins with orienting the viewers to the Penobscot River in Maine, in relation to mercury concentrations. The map was created using the shapefiles that were read in, and the subsequent spatial dataset subset during the original data wrangling. In the figures you can see that the highest concentrations are centered around the historical site itself.

Map 1 is a map of mercury concentration, in part per million (PPM) as a color gradient in the spatially situated river. As seen the highest values are around 108.63 PPM, which is 100 times that of the PEC.

Map 2 is a zoomed in map of samples around the contamination site.

Map 3 is the spatial distribution of sample points.

Penobscot_Map = mapview(Penobscot_clean,
                        col.regions = "lightblue",
                        alpha.regions = 1,
                        map.types = "CartoDB.Positron",
                        legend = FALSE)

# Map of concentrations across the entire river
Concentration_Map = Penobscot_Map + mapview(data_clean, zcol = "Max_Result_.Raw.",
                                            alpha.regions = 0.5, 
                                            col.regions = colorRampPalette(c("yellow", 
                                                                            "orange", 
                                                                            "red")),
                                            layer.name = "Mercury Concentration (PPM)")


# Map of concentrations near the site
bbox <- st_as_sfc(st_bbox(c(xmin = -68.844, ymin = 44.734, xmax = -68.819,
                            ymax = 44.748), crs = 4269))

data_near_site <- data_clean %>% 
  st_filter(bbox)

Site_Map = mapview(data_near_site, zcol = "Max_Result_.Raw.",
                                            alpha.regions = 0.5, 
                                            col.regions = colorRampPalette(c("yellow", 
                                                                            "orange", 
                                                                            "red")),
                                            layer.name = "Mercury Concentration (PPM)",
                   map.types = "CartoDB.Positron") +
  mapview(Site, col.regions = "black", alpha.regions = 1)

We then ran a multiple linear regression to determine if either variable- time or depth, had an impact on mercury concentration in the river.

#Multiple linear regressions

#Multiple linear regression to see depth  and time

mercury_depthtime_regression <- lm(data=data_clean, 
                               Max_Result_.Raw.~Avg_Depth_ft+ Date)
summary (mercury_depthtime_regression)
## 
## Call:
## lm(formula = Max_Result_.Raw. ~ Avg_Depth_ft + Date, data = data_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -1.322  -0.695  -0.247   0.184 107.775 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.083e+00  2.969e-01   3.649 0.000265 ***
## Avg_Depth_ft -4.542e-02  1.811e-02  -2.508 0.012162 *  
## Date         -1.030e-05  1.849e-05  -0.557 0.577329    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.425 on 7288 degrees of freedom
## Multiple R-squared:  0.0009819,  Adjusted R-squared:  0.0007078 
## F-statistic: 3.582 on 2 and 7288 DF,  p-value: 0.02788

This shows that at least one predictor is associated with concentration, but it is small. This sets the stage for the rest of our analysis.

Temporal Relationship

This analysis starts with specific wrangling into a month colomn to prepare the data for a time series analysis. Due to logistical and budget constraints, it seems that data was not regularly collected, meaning there are some gaps in the samples collected each month.

#preliminary wrangle for analysis 

MonthlyData <- data_clean %>%
  mutate(Year = year(Date), Month = month(Date)) %>%
  group_by(Year, Month) %>%
  summarize(Mean_Conc = mean(Max_Result_.Raw., na.rm = TRUE), .groups = "drop")

MonthlyData_final <- MonthlyData %>%
  mutate(Date = ymd(paste0(Year, "-", Month, "-01"))) %>%
  select(Date, Mean_Conc)

#time plot

ggplot(MonthlyData_final,
        aes(x = Date, y = Mean_Conc)) +
  geom_line() +
  geom_smooth(method = "lm")+
  labs(
  title= "Mean Monthly Mercury Concentrations over Time (PPM)",
  x = "Date", 
  y = "Mean Monthly Mercury Concentrations (PPM)")+ 
Mytheme
## `geom_smooth()` using formula = 'y ~ x'

#Linear model
mercury_time_regression <- lm(data=MonthlyData_final, 
                               Mean_Conc~Date)
summary (mercury_time_regression)
## 
## Call:
## lm(formula = Mean_Conc ~ Date, data = MonthlyData_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2349 -0.5150 -0.1971  0.0655  8.0094 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0284637  1.6243739  -0.633    0.530
## Date         0.0001307  0.0001061   1.232    0.224
## 
## Residual standard error: 1.298 on 49 degrees of freedom
## Multiple R-squared:  0.03006,    Adjusted R-squared:  0.01026 
## F-statistic: 1.518 on 1 and 49 DF,  p-value: 0.2237
#Time Series Object

f_month <- month(first(data_clean$Date)) 
f_year  <- year(first(data_clean$Date))

MercuryMonthly.ts <- ts(MonthlyData_final$Mean_Conc,
                        start = c(min(f_year), 
                                  min(f_month)),
                        frequency = 12) 

#decomposition

MonthlyData_final_decomp <- stl(MercuryMonthly.ts,s.window = "periodic")
plot(MonthlyData_final_decomp, main = "Monthly Mercury Decomposition")

#MannKendallSeasonal

MonthlyMercury_Seasonal_Trend <- Kendall::SeasonalMannKendall(MercuryMonthly.ts)
summary(MonthlyMercury_Seasonal_Trend)
## Score =  -6 , Var(Score) = 128
## denominator =  84
## tau = -0.0714, 2-sided pvalue =0.59588
# MannKendallnotseasonal
Mercury_Seasonal_Component <- 
MonthlyData_final_decomp$time.series[, "seasonal"]

MonthlyMercury_noseason_ts <- MercuryMonthly.ts - Mercury_Seasonal_Component

Mercury_noseason_trend <- trend::mk.test(MonthlyMercury_noseason_ts)
summary(Mercury_noseason_trend)
##             Length Class  Mode     
## data.name   1      -none- character
## p.value     1      -none- numeric  
## statistic   1      -none- numeric  
## null.value  1      -none- numeric  
## parameter   1      -none- numeric  
## estimates   3      -none- numeric  
## alternative 1      -none- character
## method      1      -none- character
## pvalg       1      -none- numeric

We first plotted a graph of merucry concentrations as a function of time. We see an unexpected peak in 2021. Next, we ran a simple linear regression, and found a non statistically significant p-value. Because we did not find a linear trend, so we ran the seasonal Mann Kendall test to determine whether there is a monotonic trend that fluctuates with seasonal patterns, we obtained a p-value of 0.596. This is not a significant p-value, indicating that there is no statistically significant trend in the seasonal data. We then ran a non-seasonal trend test, which also produced a non-significant p-value. Together, these results show that there is no temporal trend in contamination levels in the Penobscot River. Therefore, we must reject our first hypothesis in its entirety.

#Depth How do Mercury levels change, based on depth?

The following graph reflects the mercury concentrations from 1970 - 2021 by depth type (surface sediment or subsurface sediment). As seen in the graph, the 1970 has more surface sediment mercury concentrations which reflects the proximity to the initial pollution. Whereas later decades have more subsurface sediment mercury concentrations.

DepthPlot.data <- data_clean %>%
  na.omit() %>%
  filter(Avg_Depth_ft>=0)

#Plot of concentrations vs time by depth type.
Depthdata<-data_clean %>% 
  na.omit()

ggplot(Depthdata,
       aes(x=Date, 
           y=Max_Result_.Raw., 
           color = Depth_Category))+
    geom_point(alpha = 0.7) + 
  labs(
    title = "Mercury Concentration vs. Average Depth",
    x = "Date",
    y = "Mercury Concentration (PPM)") +
  Mytheme

#simple plot
ggplot(DepthPlot.data,
       aes(x = Avg_Depth_ft, y = Max_Result_.Raw.)) +
  geom_point(alpha = 0.7, color = "darkblue") + 
  labs(
    title = "Mercury Concentration vs. Average Depth",
    x = "Average Sample Depth (ft)",
    y = "Mercury Concentration (PPM)"
  ) +
  Mytheme

#top two feet because that is the most biologically active zone 
DepthPlot.data_bioactive <- DepthPlot.data %>% 
  filter(Avg_Depth_ft <= 2)

ggplot(DepthPlot.data_bioactive,
       aes(x = Avg_Depth_ft, y = Max_Result_.Raw.)) +
  geom_point(alpha = 0.7, color = "darkblue") + 
  labs(
    title = "Mercury Concentration vs. Average Depth",
    x = "Average Sample Depth (ft)",
    y = "Mercury Concentration (PPM)"
  ) +
  Mytheme

#Linear regression to see depth

mercury_depth_regression <- lm(data=DepthPlot.data, 
                               Max_Result_.Raw.~Avg_Depth_ft)
summary (mercury_depth_regression)
## 
## Call:
## lm(formula = Max_Result_.Raw. ~ Avg_Depth_ft, data = DepthPlot.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -0.967  -0.674  -0.259   0.164 107.786 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.97227    0.03622  26.847  < 2e-16 ***
## Avg_Depth_ft -0.10488    0.02714  -3.864 0.000112 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.44 on 7181 degrees of freedom
## Multiple R-squared:  0.002075,   Adjusted R-squared:  0.001936 
## F-statistic: 14.93 on 1 and 7181 DF,  p-value: 0.0001124

In our initial analysis of the first plot, we observed that subsurface sediment generally showed higher mercury concentrations over time. In the next figure, we found that mercury concentrations were primarily aggregated within the top 0–2 feet of depth. Because this upper zone is the most biologically active, we created another plot isolating only the top two feet. Even after this refinement, no significant temporal trend was detected.

To further assess the potential relationship between mercury concentration and depth, we conducted a linear regression analysis. Although the data did not visually suggest a trend, the regression results indicated evidence of a linear relationship, as the p-value was statistically significant (p < 0.05). However, the adjusted R-squared value of 0.001936 showed that the relationship is extremely weak.

Therefore, although a statistically detectable linear relationship exists, depth cannot be considered a meaningful predictor of mercury concentration because the effect size is negligible. We must conclude that our results are inconclusive, meaning we cannot confidently determine whether our hypothesis is true or false.

Conclusion

#Spearman Rho because there is no seasonality, it is non-parametric and missing
#data is allowed.

Next steps would be to run this with more comprehensive data.